Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso

Authors

  • Abhradeep Thakurta
  • Adam D. Smith
Abstract

We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning.

We consider settings in which there is a well-defined answer, in the following sense: Suppose that there is a nonprivate model selection procedure f, which is the reference to which we compare our performance. Our differentially private algorithms output the correct value f(D) whenever f is stable on the input data set D. We work with two notions, perturbation stability and subsampling stability.

We give two classes of results: generic ones, that apply to any function with discrete output set; and specific algorithms for the problem of sparse linear regression. The algorithms we describe are efficient and in some cases match the optimal nonprivate asymptotic sample complexity.

Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and show that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.
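The subsampling-stability idea can be illustrated with a minimal sketch: run the nonprivate selector on several subsamples and privately release the winning model only if it wins the vote by a clear, noise-protected margin. This is a hedged illustration only, not the paper's exact algorithm; the half-size subsamples, the vote threshold m/4, the noise scale, and the `sign_selector` example are all illustrative choices.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_select(data, f, eps, m=20, rng=None):
    """Sketch of subsampling-stability-based private selection:
    run the nonprivate selector f on m half-size subsamples; if one
    model wins the vote by a clear noisy margin, release it, otherwise
    release None (the "no stable answer" symbol)."""
    rng = rng or random.Random(0)
    n = len(data)
    votes = Counter()
    for _ in range(m):
        # Subsample with replacement, half the original size.
        sub = [data[rng.randrange(n)] for _ in range(n // 2)]
        votes[f(sub)] += 1
    (winner, top), *rest = votes.most_common(2)
    runner_up = rest[0][1] if rest else 0
    # Noisy threshold test: the vote margin must survive Laplace noise.
    if top - runner_up + laplace_noise(2.0 / eps, rng) > m / 4:
        return winner
    return None

# Illustrative selector: the majority sign of the data.
def sign_selector(sub):
    return 1 if sum(sub) >= 0 else -1
```

On a data set where the selector's answer is obviously stable, e.g. `private_select([1.0] * 100, sign_selector, eps=1.0)`, the vote is unanimous and the noisy margin test passes, so the correct model is released; on an unstable input the algorithm prefers to output None rather than leak information through an arbitrary choice.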


Related articles

Differentially Private Model Selection via Stability Arguments and the Robustness of the Lasso

We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning. We consider settings in which there is a well-defined answer, in ...

Increasing the Capacity and PSNR in Blind Watermarking Resist Against Cropping Attacks

The use of watermarking on the Internet and in digital media has increased dramatically in recent years; it is one of the most powerful tools for protecting copyright. Local image features have been widely used in watermarking techniques based on feature points. In various papers, invariant features have been used to obtain robustness against attacks. The purpose of this research was based on local f...

Feature Selection in Big Data by Using the Enhancement of the Mahalanobis–Taguchi System; Case Study: Identifying Bad-Credit Clients of a Private Bank of the Islamic Republic of Iran

The Mahalanobis–Taguchi System (MTS) is a relatively new collection of methods proposed for diagnosis and forecasting using multivariate data. It consists of two main parts: Part 1, the selection of useful variables in order to reduce the complexity of multi-dimensional systems, and Part 2, diagnosis and prediction, which is used to predict the abnormal group according to the remaining us...

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High-dimensional microarray datasets are difficult to classify since they have many features, a small number of instances, and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With advances in metagenomics, data mining research has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection, or feature selection, is essential. Removing redundant genes in this way increases the speed and accuracy of classification. After applying the gene se...


Journal:

Volume   Issue

Pages  -

Publication date: 2013